Problem-Independent Architectures
Attention-based Neural Cellular Automata
Recent extensions of Cellular Automata (CA) have incorporated key ideas from modern deep learning, dramatically extending their capabilities and catalyzing a new family of Neural Cellular Automata (NCA) techniques. Inspired by Transformer-based architectures, our work presents a new class of attention-based NCAs formed using a spatially localized--yet globally organized--self-attention scheme. We introduce an instance of this class named Vision Transformer Cellular Automata (ViTCA).
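To make the localized-attention idea concrete, here is a minimal PyTorch sketch of one NCA update step in which every cell attends only over its 3×3 neighborhood; the module name, shapes, and the residual update rule are illustrative assumptions, not the released ViTCA code.

```python
# Hypothetical sketch of a spatially localized self-attention NCA update (ViTCA-style).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalizedAttentionNCA(nn.Module):
    """One NCA update step where each cell attends over its k x k neighborhood."""

    def __init__(self, dim: int, neighborhood: int = 3):
        super().__init__()
        self.k = neighborhood
        self.to_qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.update = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, cells: torch.Tensor) -> torch.Tensor:
        # cells: (B, C, H, W) grid of cell states.
        B, C, H, W = cells.shape
        q, k, v = self.to_qkv(cells.permute(0, 2, 3, 1)).chunk(3, dim=-1)  # each (B, H, W, C)

        def neighborhoods(t):
            # Gather each cell's k x k neighborhood of keys/values (zero-padded at borders).
            t = t.permute(0, 3, 1, 2).contiguous()                  # (B, C, H, W)
            patches = F.unfold(t, self.k, padding=self.k // 2)      # (B, C*k*k, H*W)
            return patches.view(B, C, self.k * self.k, H * W).permute(0, 3, 2, 1)  # (B, HW, k*k, C)

        kn, vn = neighborhoods(k), neighborhoods(v)
        qc = q.reshape(B, H * W, 1, C)                              # each cell queries its own neighborhood
        attn = torch.softmax((qc * kn).sum(-1) / C ** 0.5, dim=-1)  # (B, HW, k*k)
        out = (attn.unsqueeze(-1) * vn).sum(dim=2)                  # (B, HW, C)
        # Residual, NCA-style update of the cell grid.
        return cells + self.update(out).reshape(B, H, W, C).permute(0, 3, 1, 2)
```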
AdaNCA: Neural Cellular Automata as Adaptors for More Robust Vision Transformer
Vision Transformers (ViTs) demonstrate remarkable performance in image classification through visual-token interaction learning, particularly when equipped with local information via region attention or convolutions. Although such architectures improve feature aggregation across granularities, they often fail to contribute to the robustness of the networks. Neural Cellular Automata (NCA) enable the modeling of global visual-token representations through local interactions, and their training strategies and architecture design confer strong generalization ability and robustness against noisy inputs. In this paper, we propose Adaptor Neural Cellular Automata (AdaNCA) for Vision Transformers, which uses NCA as plug-and-play adaptors between ViT layers, enhancing ViT's performance and robustness against adversarial samples as well as out-of-distribution inputs. To overcome the large computational overhead of standard NCAs, we propose Dynamic Interaction for more efficient interaction learning. Using our analysis of AdaNCA placement and robustness improvement, we also develop an algorithm for identifying the most effective insertion points for AdaNCA. With less than a 3% increase in parameters, AdaNCA yields more than 10% absolute improvement in accuracy under adversarial attacks on the ImageNet1K benchmark. Moreover, extensive evaluations across eight robustness benchmarks and four ViT architectures demonstrate that AdaNCA, as a plug-and-play module, consistently improves the robustness of ViTs.
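A hedged sketch of the plug-and-play placement described above: an NCA-style module with stochastic local updates is inserted between two ViT stages to refine the token grid. The class NCAAdaptor, the depthwise perception/update split, and the fire rate are illustrative assumptions rather than the AdaNCA reference implementation, and the Dynamic Interaction mechanism is not reproduced here.

```python
# Illustrative sketch of inserting an NCA-style adaptor between ViT stages.
import torch
import torch.nn as nn


class NCAAdaptor(nn.Module):
    """Refines the ViT token grid with a few steps of local 3x3 interactions."""

    def __init__(self, dim: int, steps: int = 4, fire_rate: float = 0.5):
        super().__init__()
        self.steps, self.fire_rate = steps, fire_rate
        self.perceive = nn.Conv2d(dim, 2 * dim, kernel_size=3, padding=1, groups=dim)
        self.update = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, tokens: torch.Tensor, grid_hw) -> torch.Tensor:
        # tokens: (B, N, C) with N == H * W (a class token would be handled outside this sketch).
        B, N, C = tokens.shape
        H, W = grid_hw
        x = tokens.transpose(1, 2).reshape(B, C, H, W)
        for _ in range(self.steps):
            # Stochastic ("asynchronous") cell update: only a random subset of cells fires each step.
            fire = (torch.rand(B, 1, H, W, device=x.device) < self.fire_rate).float()
            x = x + fire * self.update(self.perceive(x))
        return x.reshape(B, C, N).transpose(1, 2)


# Usage inside a hypothetical ViT forward pass, between two stages:
#   tokens = stage_k(tokens)
#   tokens = nca_adaptor(tokens, (14, 14))
#   tokens = stage_k_plus_1(tokens)
```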
Stochastic Variational Deep Kernel Learning
Deep kernel learning combines the non-parametric flexibility of kernel methods with the inductive biases of deep learning architectures. We propose a novel deep kernel learning model and stochastic variational inference procedure which generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training. Specifically, we apply additive base kernels to subsets of output features from deep neural architectures, and jointly learn the parameters of the base kernels and the deep network through a Gaussian process marginal likelihood objective. Within this framework, we derive an efficient form of stochastic variational inference which leverages local kernel interpolation, inducing points, and structure-exploiting algebra. We show improved performance over stand-alone deep networks, SVMs, and state-of-the-art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet.
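As a rough illustration of the deep kernel construction (plain PyTorch with assumed names; the paper's local kernel interpolation, inducing points, and stochastic variational objective are omitted), a base RBF kernel is applied to deep features and both sets of parameters are trained through a GP marginal likelihood:

```python
# Minimal deep kernel sketch: an RBF base kernel on the outputs of a neural feature extractor.
import torch
import torch.nn as nn


class DeepRBFKernel(nn.Module):
    """RBF base kernel evaluated on deep features; all parameters trained jointly."""

    def __init__(self, in_dim: int, feat_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        self.log_lengthscale = nn.Parameter(torch.zeros(()))
        self.log_outputscale = nn.Parameter(torch.zeros(()))

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        z1, z2 = self.net(x1), self.net(x2)              # deep features
        d2 = torch.cdist(z1, z2).pow(2)                  # squared distances in feature space
        ls2 = self.log_lengthscale.exp().pow(2)
        return self.log_outputscale.exp() * torch.exp(-0.5 * d2 / ls2)


def gp_marginal_nll(kernel: DeepRBFKernel, x: torch.Tensor, y: torch.Tensor, noise: float = 1e-2):
    """Exact GP marginal likelihood (up to a constant); a small-data stand-in for the
    paper's stochastic variational objective with inducing points."""
    K = kernel(x, x) + noise * torch.eye(x.shape[0], device=x.device)
    L = torch.linalg.cholesky(K)
    alpha = torch.cholesky_solve(y.unsqueeze(-1), L)
    return 0.5 * (y.unsqueeze(-1) * alpha).sum() + L.diagonal().log().sum()
```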
Unsupervised Graph Neural Architecture Search with Disentangled Self-supervision
Existing graph neural architecture search (GNAS) methods rely heavily on supervised labels during the search process and fail to handle the ubiquitous scenarios where supervision is not available. In this paper, we study the problem of unsupervised graph neural architecture search, which remains unexplored in the literature. The key problem is to discover the latent graph factors that drive the formation of graph data as well as the underlying relations between the factors and the optimal neural architectures. Handling this problem is challenging given that the latent graph factors and the architectures are highly entangled, due to the nature of the graph and the complexity of the neural architecture search process. To address the challenge, we propose a novel Disentangled Self-supervised Graph Neural Architecture Search (DSGAS) model, which is able to discover the optimal architectures capturing various latent graph factors in a self-supervised fashion based on unlabeled graph data. Specifically, we first design a disentangled graph super-network capable of incorporating multiple architectures with factor-wise disentanglement, which are optimized simultaneously. Then, we estimate the performance of architectures under different factors by our proposed self-supervised training with joint architecture-graph disentanglement. Finally, we propose a contrastive search with architecture augmentations to discover architectures with factor-specific expertise. Extensive experiments on 11 real-world datasets demonstrate that the proposed DSGAS model achieves state-of-the-art performance against several baseline methods in an unsupervised manner.
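One way to picture the factor-wise disentangled super-network is a mixed-operation layer that keeps a separate architecture distribution per latent factor. The sketch below is an illustrative assumption; the class name, shapes, and mixing scheme are not taken from the DSGAS code.

```python
# Hedged sketch of a factor-wise mixed operation: one architecture distribution per factor.
import torch
import torch.nn as nn


class FactorwiseMixedOp(nn.Module):
    """Mixed operation with one architecture parameter vector per latent graph factor."""

    def __init__(self, candidate_ops, num_factors: int):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)
        # One softmax-normalized architecture distribution per factor (hypothetical parameterization).
        self.alpha = nn.Parameter(1e-3 * torch.randn(num_factors, len(self.ops)))

    def forward(self, x, edge_index):
        weights = torch.softmax(self.alpha, dim=-1)                  # (K, |O|)
        outs = torch.stack([op(x, edge_index) for op in self.ops])   # (|O|, N, C)
        # Factor k's node representation is its own weighted mixture of the candidates.
        return torch.einsum("ko,onc->knc", weights, outs)            # (K, N, C)
```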
Neural Architecture Dilation for Adversarial Robustness (Supplementary Material)
For the dilation architecture, we use a DAG with 4 nodes as the supernetwork. There are 8 operation candidates for each edge, comprising four convolutional operations (3×3 separable convolution, 5×5 separable convolution, 3×3 dilated separable convolution, and 5×5 dilated separable convolution), two pooling operations (3×3 average pooling and 3×3 max pooling), and two special operations: an identity operation representing a skip connection and a zero operation indicating that two nodes are not connected. During dilation, we stack 3 cells for each of the 3 blocks in the WRN34-10. During retraining, the number is increased to 6. The dilated architectures designed by NADAR are shown in Figure 1.
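For reference, this candidate set mirrors the standard DARTS-style operation table. The sketch below declares such a set in PyTorch with simplified separable convolutions and placeholder names; it is an illustration, not the NADAR implementation.

```python
# Illustrative declaration of the eight per-edge operation candidates (DARTS-style).
import torch
import torch.nn as nn


class Zero(nn.Module):
    """The 'not connected' operation: maps any input to zeros."""

    def forward(self, x):
        return torch.zeros_like(x)


def sep_conv(C, k, dilation=1):
    # Depthwise + pointwise convolution keeping channel count and resolution.
    pad = dilation * (k - 1) // 2
    return nn.Sequential(
        nn.Conv2d(C, C, k, padding=pad, dilation=dilation, groups=C, bias=False),
        nn.Conv2d(C, C, 1, bias=False),
        nn.BatchNorm2d(C),
        nn.ReLU(inplace=True),
    )


def candidate_ops(C):
    """Eight per-edge candidates: 4 (dilated) separable convs, 2 poolings, identity, and zero."""
    return nn.ModuleDict({
        "sep_conv_3x3": sep_conv(C, 3),
        "sep_conv_5x5": sep_conv(C, 5),
        "dil_conv_3x3": sep_conv(C, 3, dilation=2),
        "dil_conv_5x5": sep_conv(C, 5, dilation=2),
        "avg_pool_3x3": nn.AvgPool2d(3, stride=1, padding=1),
        "max_pool_3x3": nn.MaxPool2d(3, stride=1, padding=1),
        "skip_connect": nn.Identity(),
        "none": Zero(),
    })
```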
Appendix for Multi-task Graph Neural Architecture Search with Task-aware Collaboration and Curriculum
Notation: w denotes the model weight, α the architecture parameter, N the number of chunks, θ the trainable parameter in the soft task-collaborative module, p the parameter generated by Eq. (9), which is replaced during curriculum training by the parameter generated by Eq. (11), δ the parameter controlling graph structure diversity, and γ the parameter controlling task-wise curriculum training; operations are drawn from the candidate set O. BNRist is the abbreviation of Beijing National Research Center for Information Science and Technology. Here we provide the detailed derivation process of Eq. (10), in which Eq. (9) is substituted into the expression. We consider a search space of standard layer-by-layer architectures without sophisticated connections such as residual or jumping connections, though our proposed method can be easily generalized. We choose six widely used message-passing GNN layers as our operation candidate set O: GCN [4], GAT [9], GIN [10], SAGE [2], k-GNN [5], and ARMA [3]. Besides, we also adopt an MLP, which does not consider graph structure.
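A sketch of how this candidate set could be declared, assuming PyTorch Geometric as the backend (an assumption; GraphConv stands in for the k-GNN layer, and the MLP candidate ignores the graph structure):

```python
# Hedged sketch of the operation candidate set O using PyTorch Geometric layers.
import torch.nn as nn
from torch_geometric.nn import (
    ARMAConv, GATConv, GCNConv, GINConv, GraphConv, SAGEConv,
)


def build_candidates(in_dim: int, out_dim: int) -> nn.ModuleDict:
    """Hypothetical declaration of the candidate set O plus the MLP baseline."""
    return nn.ModuleDict({
        "gcn": GCNConv(in_dim, out_dim),
        "gat": GATConv(in_dim, out_dim),
        "gin": GINConv(nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                     nn.Linear(out_dim, out_dim))),
        "sage": SAGEConv(in_dim, out_dim),
        "k_gnn": GraphConv(in_dim, out_dim),    # stand-in for the k-GNN layer
        "arma": ARMAConv(in_dim, out_dim),
        "mlp": nn.Linear(in_dim, out_dim),      # does not consider graph structure
    })
```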
Hierarchical Neural Architecture Search for Deep Stereo Matching - Supplementary Materials
In this supplementary material, we briefly introduce three widely used stereo matching benchmarks, provide details of the separate search (Section 3.3 of the main manuscript) of the Feature Net and the Matching Net, and show more qualitative results of our method on various datasets as well as screenshots of the benchmarks. KITTI 2012 and 2015 datasets. These two real-world datasets are collected from a driving car. KITTI 2012 contains 194 training image pairs and 195 test image pairs. KITTI 2015 contains 200 stereo pairs for training and 200 for testing. The typical resolution of KITTI images is 376×1240. For KITTI 2012, the semi-dense ground-truth disparity maps are generated by a Velodyne HDL-64E LiDAR, while for KITTI 2015, 3D CAD models for cars are manually inserted [1].
MOTE-NAS: Multi-Objective Training-based Estimate for Efficient Neural Architecture Search
Neural Architecture Search (NAS) methods seek effective optimization toward performance metrics such as model accuracy and generalization while facing challenges regarding search cost and GPU resources. Recent Neural Tangent Kernel (NTK) based NAS methods achieve remarkable search efficiency through a training-free model estimate; however, they overlook the non-convex nature of DNNs in the search process. In this paper, we develop the Multi-Objective Training-based Estimate (MOTE) for efficient NAS, retaining search effectiveness and achieving a new state of the art in the accuracy-cost trade-off. To improve upon NTK-based estimates, and inspired by the Training Speed Estimation (TSE) method, MOTE models the actual performance of DNNs from a macro to a micro perspective by capturing the loss landscape and the convergence speed simultaneously. Using two reduction strategies, MOTE is generated from a reduced architecture and a reduced dataset.
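As a rough sketch of the training-based ingredient, a TSE-style proxy accumulates the training loss over a short optimization budget. The reduced-architecture and reduced-dataset strategies and the loss-landscape term of MOTE are omitted, and the function below is an illustrative assumption rather than the MOTE-NAS code.

```python
# Hedged sketch of a TSE-style convergence-speed proxy: average training loss
# over a short budget of SGD steps (lower loss ~ faster convergence ~ better rank).
import torch


def training_speed_estimate(model, loader, steps: int = 100, lr: float = 0.01) -> float:
    """Average training loss over a short optimization budget on a (reduced) dataset."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    total = 0.0
    data_iter = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)   # restart the loader if the budget exceeds one epoch
            x, y = next(data_iter)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        total += loss.item()
    return total / steps
```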